Abstract

Process mining is a technique that performs an automatic analysis of business processes from a 
log of events with the promise of understanding how processes are executed in an organisation. 

Several models have been proposed to address this problem; here, however, we propose a different approach to dealing with uncertainty. By uncertainty, we mean estimating the probability of some sequence of tasks occurring in a business process, given that only a subset of tasks may be observable.

In this sense, this work proposes a new approach to process mining using Bayesian Networks. These structures can take into account the probability of a task being present or absent in the business process. Moreover, Bayesian Networks can learn these probabilities automatically through mechanisms such as maximum likelihood estimation and EM clustering.

Experiments on a loan application case study suggest that Bayesian Networks are adequate structures for process mining and enable a deep analysis of the business process model that can be used to answer queries about that process.


1 Introduction 


Process mining is a technique that enables the automatic analysis of business processes based on event logs. Instead of designing a workflow, process mining consists of gathering information about the tasks that take place during the workflow process and storing that data in structured formats called event logs (van der Aalst 2011). While gathering this information, it is assumed that (1) each event refers to a task in the business process, (2) each event is associated with an instance of the workflow, and (3) since the events are stored by their execution time, they are sorted (van der Aalst et al. 2004).

During the last decade, process mining has attracted growing attention in the scientific community due to its promise to provide techniques for process discovery that will lead to increased productivity and reduced costs (van der Aalst & de Medeiros 2005).
Process modelling can be seen as the set of techniques used to represent a business process graphically. This graphical representation describes dependencies between activities that need to be executed together in order to fulfil a business target (Weske 2012).

Since process mining takes the order of the events into consideration, there are already many models that can be directly applied to represent the workflow. These include Markov Chains (Ferreira et al. 2007; Rebuge & Ferreira 2012), Petri Nets (van der Aalst 1998), Neural Networks (Cook & Wolf 1998) and BPMN (van der Aalst 2011). However, Markov Chains and Petri Nets are the models most used in the process mining literature (Tiwari et al. 2008).

In this work, we propose an alternative representation of business processes using Bayesian Networks. A Bayesian Network can be defined as a directed acyclic graph in which each node represents a random variable and each edge represents a direct influence from the source node to the target node (conditional dependencies) (Spirtes et al. 2001). Bayesian Networks differ from Markov Chains in that their directed structure must be cycle-free. Moreover, Bayesian Networks deal with uncertainty differently from Markov Chains. In the latter, business processes are modelled as a chain of events that are observed to occur. Under a Bayesian Network perspective, this does not apply: each task can either be present or absent in the business process. Therefore, it is possible to compute the probability of some task of the business process occurring even when we do not know which tasks have already been performed (Pearl 2009).

With this research work, we argue that the capabilities of Bayesian Networks make them a promising technique for modelling business processes and for performing analyses regarding risk management, cost reduction, the identification of irrelevant or repetitive tasks, etc.

The outline of this work is as follows. Section 2 presents a brief summary of Markov Chains. Section 3 introduces Bayesian Networks: it shows how to compute probabilistic inferences and presents some techniques used to automatically learn conditional probabilities in Bayesian Networks. Section 4 shows how Bayesian Networks can be applied in the realm of process mining, namely how one can define the structure of a Bayesian Network and how one can perform automatic learning. Section 5 presents a case study in which we apply the proposed approach. Finally, Section 6 summarises the current work and presents the main conclusions achieved and some directions for future work.
2 Markov Chains


A Markov Chain is defined by a state space $Val(X)$ and a transition model that defines, for every state $x \in Val(X)$, a next-state distribution over $Val(X)$. More precisely, the transition model $\mathcal{T}$ specifies, for each pair of states $x, x'$, the probability $\mathcal{T}(x \rightarrow x')$ of going from state $x$ to $x'$ (Koller & Friedman 2009).


Figure 1: Example of a Markov Chain.


In Markov Chains, the transition probability matrix must be stochastic, that is, each row of the matrix must sum to one. Matrix (1) represents the transition matrix of the Markov Chain in Figure 1.
Suppose that one is in state B at time $n$. To compute the evolution of the system for time $n + 1$, one just needs to multiply the current state vector by the transition probability matrix. The current state B is encoded as the vector $[0\ 1\ 0]$.
The calculations in Equation (2) show that the probability of transiting from state B to A is 0.15, from B to B is 0.8, and from B to C is 0.05.
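A minimal Python sketch of this one-step update follows (the original work contains no code; Python is our choice). Only row B of the matrix is given in the text. In row A, only A → B = 0.075 is taken from Equation (3); the remaining entries of rows A and C are hypothetical placeholders chosen so that every row is stochastic. Because the current state is one-hot, the result equals row B regardless of those placeholders.

```python
STATES = ["A", "B", "C"]

# Row-stochastic transition matrix T[i][j] = Pr(state i -> state j).
# Row B comes from the text; A -> B = 0.075 comes from Equation (3);
# every other entry of rows A and C is a HYPOTHETICAL placeholder.
T = [
    [0.9, 0.075, 0.025],  # row A (partly hypothetical)
    [0.15, 0.8, 0.05],    # row B (from the text)
    [0.0, 0.0, 1.0],      # row C (hypothetical)
]

def is_stochastic(matrix, tol=1e-9):
    """Each row of a transition matrix must sum to one."""
    return all(abs(sum(row) - 1.0) < tol for row in matrix)

def step(state_vector, matrix):
    """Distribution at time n + 1: the row vector state_vector times matrix."""
    return [sum(state_vector[i] * matrix[i][j] for i in range(len(matrix)))
            for j in range(len(matrix[0]))]

current = [0, 1, 0]  # one-hot encoding of state B at time n
print(dict(zip(STATES, step(current, T))))  # {'A': 0.15, 'B': 0.8, 'C': 0.05}
```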


Moreover, if one wishes to compute the probability of the sequence A → B → B → C, one would need to perform the following calculation:

$$Pr(A \rightarrow B \rightarrow B \rightarrow C) = Pr(A \rightarrow B)\,Pr(B \rightarrow B)\,Pr(B \rightarrow C) = 0.075 \times 0.8 \times 0.05 = 0.003 \qquad (3)$$
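The product in Equation (3) can be sketched as follows, assuming the chain is known to start in the first state of the path. The dictionary holds only the three transition probabilities quoted in the text, and the function name is illustrative.

```python
from math import prod

# The three transition probabilities quoted in the text.
TRANSITIONS = {("A", "B"): 0.075, ("B", "B"): 0.8, ("B", "C"): 0.05}

def sequence_probability(path, transitions):
    """Pr(path[0] -> path[1] -> ...), given that the chain starts in path[0]:
    the product of the consecutive one-step transition probabilities."""
    return prod(transitions[(a, b)] for a, b in zip(path, path[1:]))

p = sequence_probability(["A", "B", "B", "C"], TRANSITIONS)
print(round(p, 6))  # 0.003
```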


3 Bayesian Networks 

Bayesian Networks are directed acyclic graphs in which each node represents a different random variable from a specific domain and each edge represents a direct influence from the source node to the target node (Pearl 1997). The graph represents independence relationships between variables, and each node is associated with a conditional probability table (CPT) that specifies a distribution over the values of the node given each possible joint assignment of values of its parents. The full joint distribution of a Bayesian Network, where $X = X_1, \ldots, X_n$ is the list of variables, is given by (Russell & Norvig 2009):

$$Pr_c(X_1, \ldots, X_n) = \prod_{i=1}^{n} Pr(X_i \mid Parents(X_i)) \qquad (4)$$

The formula for computing classical exact inferences on Bayesian Networks is based on the full joint distribution (Equation (4)). Let $e$ be the list of observed variables and let $Y$ be the remaining unobserved variables in the network. For some query $X$, the inference is given by:

$$Pr_c(X \mid e) = \alpha\, Pr_c(X, e) = \alpha \sum_{y \in Y} Pr_c(X, e, y), \quad \text{where} \quad \alpha = \frac{1}{\sum_{x \in X} Pr_c(X = x, e)} \qquad (5)$$

The summation is over all possible $y$, i.e., all possible combinations of values of the unobserved variables $Y$. The parameter $\alpha$ is the normalisation factor for the distribution $Pr(X \mid e)$ (Russell & Norvig 2009).
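Equations (4) and (5) can be sketched as inference by enumeration over Boolean variables. The network representation (a dictionary mapping each variable to its parents and to a table of Pr(variable = True | parent values)) and the function names are hypothetical choices for illustration, and the two-node network in the usage example uses made-up numbers. Enumeration is exponential in the number of hidden variables, so this only suits small networks.

```python
from itertools import product

def joint_probability(network, assignment):
    """Equation (4): product of Pr(X_i | Parents(X_i)) over all variables."""
    p = 1.0
    for var, (parents, cpt) in network.items():
        p_true = cpt[tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p

def enumerate_query(network, query, evidence):
    """Equation (5): Pr(query | evidence), summing out the unobserved variables."""
    hidden = [v for v in network if v != query and v not in evidence]
    unnormalised = {}
    for x in (True, False):
        total = 0.0
        for values in product((True, False), repeat=len(hidden)):
            assignment = {**evidence, query: x, **dict(zip(hidden, values))}
            total += joint_probability(network, assignment)
        unnormalised[x] = total
    alpha = 1.0 / sum(unnormalised.values())  # normalisation factor of Eq. (5)
    return {x: alpha * p for x, p in unnormalised.items()}

# HYPOTHETICAL two-node network A -> B (values chosen only for illustration).
network = {
    "A": ([], {(): 0.3}),
    "B": (["A"], {(True,): 0.9, (False,): 0.2}),
}
posterior = enumerate_query(network, "A", {"B": True})
print(posterior[True])  # 0.27 / (0.27 + 0.14), about 0.6585
```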


3.1 Example of Application 

Figure 2: Example of a Bayesian Network.

Consider the Bayesian Network in Figure 2. Suppose that we want to determine the probability of rain given that we know that the grass is wet. To perform such an inference on a Bayesian Network, one can use Equation (5) in the following way:

$$Pr(R = T \mid W = T) = \alpha\, Pr(R = T) \sum_{s \in S} Pr(S = s \mid R = T)\, Pr(W = T \mid S = s, R = T) \qquad (6)$$

$$Pr(R = T \mid W = T) = \alpha\, 0.2 \times [Pr(S = T \mid R = T)\, Pr(W = T \mid S = T, R = T) + Pr(S = F \mid R = T)\, Pr(W = T \mid S = F, R = T)] \qquad (7)$$

$$Pr(R = T \mid W = T) = \alpha\, 0.2 \times [0.01 \times 0.99 + 0.99 \times 0.8] = \alpha\, 0.1604 \qquad (8)$$


Since Equations (6) to (8) only yield a quantity proportional to the posterior probability (Bayes' rule), one needs to normalise the final probabilities by the factor $\alpha$, which corresponds to:

$$\alpha = \frac{1}{Pr(R = T, W = T) + Pr(R = F, W = T)} \qquad (9)$$

So, in order to compute $\alpha$, one also needs to compute the probability of not raining given that the grass is wet, $Pr(R = F \mid W = T)$:


$$Pr(R = F \mid W = T) = \alpha\, Pr(R = F) \sum_{s \in S} Pr(S = s \mid R = F)\, Pr(W = T \mid S = s, R = F) \qquad (10)$$

$$Pr(R = F \mid W = T) = \alpha\, 0.8 \times [Pr(S = T \mid R = F)\, Pr(W = T \mid S = T, R = F) + Pr(S = F \mid R = F)\, Pr(W = T \mid S = F, R = F)] \qquad (11)$$

$$Pr(R = F \mid W = T) = \alpha\, 0.8 \times [0.4 \times 0.9 + 0.6 \times 0] = \alpha\, 0.288 \qquad (12)$$


Going back to the normalisation factor in Equation (9), one can substitute the first term of the denominator by the unnormalised result of Equation (8), 0.1604, and the second term by that of Equation (12), 0.288:


$$\alpha = \frac{1}{0.1604 + 0.288} = \frac{1}{0.4484} \approx 2.2302 \qquad (13)$$


Now that we have computed the normalisation factor, the final probabilities are: 


$$Pr(R = T \mid W = T) = \alpha\, 0.1604 = 0.3577 \qquad (14)$$

$$Pr(R = F \mid W = T) = \alpha\, 0.288 = 0.6423 \qquad (15)$$
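The arithmetic of Equations (8) and (12) to (15) can be checked with a few lines of Python, using the CPT entries that appear in those equations. Here $\alpha$ is computed from the unrounded sums, so intermediate digits may differ slightly from the rounded values above; the variable and function names are illustrative.

```python
# CPT entries of Figure 2 as used in Equations (7), (8), (11) and (12).
P_R = 0.2                                    # Pr(R = T)
P_S_GIVEN_R = {True: 0.01, False: 0.4}       # Pr(S = T | R = r)
P_W_GIVEN_SR = {(True, True): 0.99, (False, True): 0.8,    # Pr(W = T | S = s, R = r),
                (True, False): 0.9, (False, False): 0.0}   # keyed by (s, r)

def unnormalised(r):
    """Pr(R = r) * sum_s Pr(S = s | R = r) * Pr(W = T | S = s, R = r)."""
    prior = P_R if r else 1.0 - P_R
    total = 0.0
    for s in (True, False):
        p_s = P_S_GIVEN_R[r] if s else 1.0 - P_S_GIVEN_R[r]
        total += p_s * P_W_GIVEN_SR[(s, r)]
    return prior * total

rain, no_rain = unnormalised(True), unnormalised(False)   # Eqs. (8) and (12)
alpha = 1.0 / (rain + no_rain)                            # Eq. (13)
print(round(rain, 4), round(no_rain, 4))                  # 0.1604 0.288
print(round(alpha * rain, 4), round(alpha * no_rain, 4))  # 0.3577 0.6423
```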


3.2 Learning in Bayesian Networks 

There are two main approaches to building a Bayesian Network. One is to construct the network by hand and to use the knowledge of an expert to estimate the conditional probability tables. The other is to use statistical models to learn these probabilities automatically (Koller & Friedman 2009).

Estimating the conditional probabilities by hand with the knowledge of an expert is problematic for several reasons. In some situations, the network is so big that it is almost impossible for the expert to make a reliable assignment of the probabilities to the random variables. Moreover, in many situations, the distribution of the data varies with the application and over time. This makes it impossible for an expert to reliably estimate the probabilities associated with the random variables of the Bayesian Network.

Statistical models, on the other hand, offer a mechanism to automatically learn a model that represents the probability distribution of some population.

Depending on the situation being modelled, one may have either a fully observed dataset or an incomplete (partially observed) one. For the scope of this work, we only address the problem of learning in Bayesian Networks with a fully observed dataset and a known graphical structure. The data are considered fully observed if each training instance contains a full instantiation of all the random variables of the sample space (Murphy 2012).
3.2.1 Maximum Likelihood Estimation in Bayesian Networks


Maximum likelihood estimation is a statistical method that chooses the parameters of an assumed probability distribution so that the observed data are as probable as possible under that distribution. For instance, if the data are assumed to follow a Gaussian distribution, its mean and variance can be estimated from a sample of the dataset alone (Bishop 2007).

Suppose that we have the Bayesian Network specified in Figure 3. This network is parameterized by a parameter vector $\theta$, which specifies the parameters of the conditional probability distributions of the network.



Figure 3: Example of a Bayesian Network structure with unspecified conditional probability tables. 

For the network in Figure 3, each training instance is a tuple of the form $(x[m], y[m])$, where $x[m]$ is an instance of the random variable $X$, $y[m]$ is an instance of the random variable $Y$, and $m$ indexes the $m$-th training example of the training dataset $D$ of size $M$.

The likelihood function is given by:

$$L(\theta : D) = \prod_{m=1}^{M} Pr(x[m], y[m] : \theta) \qquad (16)$$


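Since a Bayesian Network specifies the full joint probability distribution $Pr(x[m], y[m] : \theta)$ through the chain rule, Equation (16) becomes:

$$L(\theta : D) = \prod_{m=1}^{M} Pr(x[m] : \theta_X)\, Pr(y[m] \mid x[m] : \theta_{Y|X}) \qquad (17)$$

$$L(\theta : D) = \left( \prod_{m=1}^{M} Pr(x[m] : \theta_X) \right) \left( \prod_{m=1}^{M} Pr(y[m] \mid x[m] : \theta_{Y|X}) \right) \qquad (18)$$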


Equation (18) shows that the likelihood function can be decomposed into two separate terms. With $N$ random variables, Equation (18) would have $N$ terms. Each of these terms is called a local likelihood function and measures how well a variable is predicted by its parents.
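The decomposition can be illustrated numerically in log space, where the product of Equation (18) becomes a sum of one local term per variable. The parameters and the training data below are hypothetical, chosen only to show that the total log-likelihood equals the sum of the two local terms.

```python
from math import isclose, log

# HYPOTHETICAL parameters theta for the network X -> Y of Figure 3.
theta_x = {True: 0.6, False: 0.4}                            # Pr(X = x)
theta_y_given_x = {(True, True): 0.7, (False, True): 0.3,    # key (y, x):
                   (True, False): 0.2, (False, False): 0.8}  # Pr(Y = y | X = x)

# HYPOTHETICAL fully observed training data D: pairs (x[m], y[m]).
data = [(True, True), (True, False), (False, False), (True, True), (False, True)]

def log_likelihood(data):
    """log L(theta : D): the logarithm of Equation (16)."""
    return sum(log(theta_x[x] * theta_y_given_x[(y, x)]) for x, y in data)

def local_log_likelihoods(data):
    """The two local terms of Equation (18), in log space."""
    ll_x = sum(log(theta_x[x]) for x, _ in data)
    ll_y = sum(log(theta_y_given_x[(y, x)]) for x, y in data)
    return ll_x, ll_y

ll_x, ll_y = local_log_likelihoods(data)
print(isclose(log_likelihood(data), ll_x + ll_y))  # True
```

Because each local term depends only on its own parameters, each one can be maximised independently, which is what makes the counting estimate of Equation (22) possible.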
Moreover, one can expand the second term of Equation (18) according to the value of $x$ in each instance:

$$\prod_{m} Pr(y[m] \mid x[m] : \theta_{Y|X}) = \prod_{m : x[m] = x_{true}} Pr(y[m] \mid x[m] : \theta_{Y|x_{true}}) \cdot \prod_{m : x[m] = x_{false}} Pr(y[m] \mid x[m] : \theta_{Y|x_{false}}) \qquad (19)$$

Going back to the simple Bayesian Network in Figure 3, if we analyse the first term of Equation (19), we can see that it ranges over the training instances in which $x = true$. These instances form two sets: $(x = true, y = true)$ and $(x = true, y = false)$. Equation (20) separates them.


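$$\prod_{m : x[m] = x_{true}} Pr(y[m] \mid x[m] : \theta_{Y|x_{true}}) = \theta_{y=true|x=true}^{\,count(x=true,\, y=true)} \cdot \theta_{y=false|x=true}^{\,count(x=true,\, y=false)} \qquad (20)$$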

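Maximising Equation (20) subject to the constraint $\theta_{y=true|x=true} + \theta_{y=false|x=true} = 1$ gives:

$$\theta_{y=false|x=true} = \frac{count(x = true, y = false)}{count(x = true, y = false) + count(x = true, y = true)} \qquad (21)$$

$$\theta_{y=false|x=true} = \frac{count(x = true, y = false)}{count(x = true)} \qquad (22)$$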

From Equation (22), we can see that the maximum likelihood estimate for a Bayesian Network with a known structure and fully observed data amounts to simply counting how many times each of the possible assignments of $X$ and $Y$ appears in the training data. To obtain a probability value, we normalise this count by the total number of instances in which the corresponding value of $X$ appears.
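The counting procedure of Equation (22) can be sketched as follows for a Boolean network X → Y. The function name, the (y, x) key convention and the training data are hypothetical; estimates are only produced for values of X that actually occur in the data, which sidesteps the division by zero discussed later for unseen parent values.

```python
from collections import Counter

def mle_cpt(data):
    """Maximum likelihood CPT for Y given X (Equations (21)-(22)):
    theta_{y|x} = count(x, y) / count(x), for every observed value x of X.
    `data` holds fully observed pairs (x[m], y[m]); result keys are (y, x)."""
    joint = Counter(data)                   # count((x, y))
    marginal = Counter(x for x, _ in data)  # count((x))
    return {(y, x): joint[(x, y)] / marginal[x]
            for x in marginal for y in (True, False)}

# HYPOTHETICAL training data D for the network X -> Y of Figure 3.
data = [(True, True), (True, False), (True, False), (False, True)]
cpt = mle_cpt(data)
print(cpt[(False, True)])  # count(x=true, y=false) / count(x=true) = 2 / 3
```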


3.3 SamIam

SamIam (Sensitivity Analysis, Modeling, Inference and More) is a tool that enables the graphical modelling of Bayesian Networks. It was developed by the Automated Reasoning Group of the University of California, Los Angeles¹.

SamIam is composed of a graphical interface and a reasoning engine. The graphical interface provides an easy way to model Bayesian Networks by specifying the random variables as nodes, causal connections as edges, and the respective conditional probability tables. The reasoning engine, on the other hand, can perform classical inferences over the plotted Bayesian Network, estimate parameters through learning mechanisms, perform sensitivity analysis, etc. For the scope of this work, only the classical inference and the learning mechanisms are necessary.

1 http://reasoning.cs.ucla.edu/samiam/

Examples of SamIam's graphical interface are given in Figures 4 to 9.



Figure 4: SamIam representation of the Bayesian Network of Figure 2.
Figure 5: Example of SamIam's inference engine: Pr(R|W = T), Pr(S|W = T).
Figure 6: Example of SamIam's inference engine: Pr(R|S = T), Pr(W|S = T).


Figure 4 presents the Bayesian Network of Figure 2 in the SamIam graphical interface. The marginal probabilities for each node are computed automatically as soon as the user builds the Bayesian Network. Figure 4 shows that $Pr(R = T) = 0.2$, $Pr(R = F) = 0.8$, $Pr(S = T) = 0.3220$, $Pr(S = F) = 0.6780$, $Pr(W = T) = 0.4484$ and $Pr(W = F) = 0.5516$.
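These marginals can be reproduced by summing the full joint distribution of Equation (4) over all eight possible worlds, using the CPT entries of Figure 2 that appear in Equations (7) to (12). The sketch below is not how SamIam computes them internally; it only checks the numbers, and its names are illustrative.

```python
from itertools import product

P_R = 0.2                                   # Pr(R = T)
P_S_GIVEN_R = {True: 0.01, False: 0.4}      # Pr(S = T | R = r)
P_W_GIVEN_SR = {(True, True): 0.99, (False, True): 0.8,    # Pr(W = T | S = s, R = r),
                (True, False): 0.9, (False, False): 0.0}   # keyed by (s, r)

def bern(p_true, value):
    """Pr(V = value) for a Boolean variable with Pr(V = True) = p_true."""
    return p_true if value else 1.0 - p_true

def marginal(var):
    """Pr(var = T): sum of the full joint distribution (Eq. 4) over all worlds."""
    total = 0.0
    for r, s, w in product((True, False), repeat=3):
        world = {"R": r, "S": s, "W": w}
        if world[var]:
            total += (bern(P_R, r) * bern(P_S_GIVEN_R[r], s)
                      * bern(P_W_GIVEN_SR[(s, r)], w))
    return total

print(round(marginal("S"), 4), round(marginal("W"), 4))  # 0.322 0.4484
```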

Figure 5 shows the inference that was computed manually in Equations (8) and (12). The red markers represent observed variables, that is, variables that are known to have occurred; they act as the conditions of the probabilities. For instance, in the probability computed manually in Equation (6), the observed variable was the condition $W = T$: we are asking for the probability of rain given that the grass was observed to be wet.



Figure 7: Example of SamIam's inference engine: Pr(R|W = T, S = T).
Figure 8: Example of SamIam's inference engine: Pr(W|S = T, R = T).
Figure 9: Example of SamIam's inference engine: Pr(S|R = T), Pr(W|R = T).


For large Bayesian Networks, inference becomes computationally heavy and impractical to perform manually. SamIam provides an interface that performs these operations automatically.

In process mining, event logs are usually associated with a large number of tasks, which can be mapped into nodes of a Bayesian Network. Consequently, in this work we relied on SamIam to automatically compute inferences related to the probability of certain sequences of tasks occurring. This mechanism is detailed in Section 4.2.


4 Bayesian Networks for Process Mining


Probabilistic graphical models, such as Bayesian Networks, are usually used for probabilistic inferences, 
that is, asking queries to the model and receiving answers in the form of probability values. 

In the realm of process mining, Bayesian Networks can represent activities as nodes (i.e., random variables), and the edges between activities can be seen as transitions between these tasks. From this structure, it is possible to learn the conditional probability tables automatically from a complete log of events using maximum likelihood estimation (Section 3.2.1). If the log is incomplete, a Bayesian Network can still learn the probability tables automatically through EM clustering, as in the work of Bobek et al. (2013), who developed a Bayesian Network to recommend business processes.

In the literature, business processes learnt from event logs are usually represented by either Markov Chains or Petri Nets (Weske 2012). In this work, however, we propose another approach to model business processes using Bayesian Networks, because they deal with uncertainty more naturally.

Bayesian Networks provide advantages in situations where we do not know whether some task has occurred and we need to determine the probability of the process terminating or of the process reaching some other task. Therefore, when there are high levels of uncertainty, these structures provide more insight than Markov Chains.


4.1 Defining the Structure

Another advantage of Bayesian Networks is that they allow the direct representation of business process diagrams by capturing the direct dependencies between tasks. However, they do not allow an explicit representation of cycles, because Bayesian Networks are directed acyclic graphs. To represent a cycle in a Bayesian Network, one would need to create many copies of the same node, which makes inference intractable, since inference in Bayesian Networks is NP-complete in general (Figures 10 and 11).

Figure 10: Example of a representation of a Bayesian Network with cycles.
Figure 11: Example of a representation of a Markov Chain with cycles.

In this work, in order to eliminate cycles from the log of events, we used a heuristic that keeps only the most probable transition between two nodes. For instance, suppose that the transition A → B occurred 900 times and the transition B → A occurred 100 times. Following this heuristic, the Bayesian Network would only contain the transition A → B. Figures 12 and 13 illustrate this example.


Figure 12: Example of a Markov Chain with a cycle.
Figure 13: Conversion of the Markov Chain to a Bayesian Network by removing the weakest edge.
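A minimal sketch of this heuristic for the two-node case of the example: of two opposite edges, keep the more frequent one. The dictionary of edge frequencies and the tie-break rule are hypothetical choices. The sketch also drops self-loops and does not attempt to break longer cycles such as A → B → C → A, which the text does not discuss.

```python
def remove_weakest_edges(freq):
    """freq[(a, b)] = number of times the transition a -> b occurred.
    For each pair of opposite edges, keep only the more frequent direction.
    Self-loops are dropped. Longer cycles are NOT handled by this sketch."""
    kept = {}
    for (a, b), n in freq.items():
        if a == b:
            continue  # a self-loop is a cycle of length one
        reverse = freq.get((b, a), 0)
        # Hypothetical tie-break: keep the alphabetically ordered direction.
        if n > reverse or (n == reverse and a < b):
            kept[(a, b)] = n
    return kept

print(remove_weakest_edges({("A", "B"): 900, ("B", "A"): 100}))  # {('A', 'B'): 900}
```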


Another structure that Bayesian Networks cannot represent directly is mutual exclusion. Two events are mutually exclusive if they cannot occur at the same time. Bayesian Networks can capture mutually exclusive events through the notion of dependence, by manually adding new edges to the network. For instance, consider the business process represented by the Bayesian Network in Figure 14. Nodes B and C represent the end of the process, while node A represents a task that begins the process. In this situation, and following the semantics of the business process, nodes B and C must be mutually exclusive. That is, the process flow can only end in one of these nodes and not in both of them at the same time.

As one can see in Figures 15 and 16, the Bayesian Network cannot represent this mutual exclusion: nothing in the network prevents B and C from being true at the same time. Therefore, in order to represent that node B cannot occur at the same time as node C, one needs to add an extra edge between B and C. This additional edge creates a new dependency between these nodes. One can then manually configure the conditional probability table of node C to represent the mutual exclusion: when node B is set to true, the probability of C occurring is zero, and vice-versa. The mutual exclusion of the Bayesian Network in Figure 14 is illustrated in Figures 17 to 19.

Figure 14: Example of a Bayesian Network with no mutually exclusive nodes.
Figure 15: Example of a Bayesian Network with no mutually exclusive nodes.
Figure 16: Example of a Bayesian Network with no mutually exclusive nodes.



Figure 17: Example of a Bayesian Network with mutual exclusion. 


Note that, in Figures 17 to 19, the probability of node C occurring when nothing is observed differs from that in the Bayesian Network of Figure 14. This happens because the extra edge in the network with mutual exclusion changes the configuration of the conditional probability tables and, consequently, the final probability values.
Figure 18: Example of a Bayesian Network with mutual exclusion.
Figure 19: Example of a Bayesian Network with mutual exclusion.
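The effect of the extra edge can be sketched with hypothetical numbers. The structure (A as the parent of both B and C, plus the added edge B → C) and every probability below are assumptions for illustration only, since the actual tables of Figures 14 to 19 are not reproduced in the text. Setting Pr(C = T | B = T) to zero makes the joint probability of B and C both occurring zero, and therefore Pr(B = T | C = T) is zero as well.

```python
from itertools import product

def bern(p_true, value):
    """Pr(V = value) for a Boolean variable with Pr(V = True) = p_true."""
    return p_true if value else 1.0 - p_true

# HYPOTHETICAL CPTs: A is assumed to be the parent of B and C, plus the
# extra edge B -> C that encodes the mutual exclusion.
P_A = 0.9                                  # Pr(A = T)
P_B_GIVEN_A = {True: 0.5, False: 0.0}      # Pr(B = T | A = a)
P_C_GIVEN_AB = {                           # Pr(C = T | A = a, B = b)
    (True, True): 0.0,    # B occurred, so C cannot occur
    (True, False): 1.0,   # hypothetical: the process ends in C if not in B
    (False, True): 0.0,
    (False, False): 0.0,
}

def joint(a, b, c):
    return bern(P_A, a) * bern(P_B_GIVEN_A[a], b) * bern(P_C_GIVEN_AB[(a, b)], c)

pr_b_and_c = sum(joint(a, True, True) for a in (True, False))
pr_c = sum(joint(a, b, True) for a, b in product((True, False), repeat=2))
print(pr_b_and_c, pr_b_and_c / pr_c)  # 0.0 0.0: Pr(B = T | C = T) is zero too
```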


4.2 SamIam: Designing a Bayesian Network


SamIam provides an intuitive interface for constructing Bayesian Networks. There are two modes in SamIam: the query mode (for learning and inferences) and the edit mode (for the network structure and the definition of conditional probabilities). When SamIam is started, the edit mode appears by default. Figure 20 shows the general edit mode interface.

Figure 20: SamIam's edit mode default interface.
The interface enables the creation and removal of nodes and of edges between nodes. For each node created, there is a configuration window that can be accessed by double-clicking the node. In this window, one must specify a unique identifier for the node and a name to be displayed in the SamIam interface. Additionally, one also needs to specify which states the node can have. For the scope of this work, we only use binary random variables, so each node has exactly two states: one representing the occurrence of the random variable and another representing its absence.

The conditional probability table can be accessed by clicking the Probabilities tab. A window similar to the one presented in Figure 21 will appear.


Figure 21: SamIam's interface to assign conditional probabilities to random variables.


In this window, a user can manually specify the conditional probabilities of the random variable. By default, SamIam fills these tables using a uniform probability distribution, that is, each state of each node has the same probability of occurring (Pr = 0.5 for binary nodes).

The Complement button can be used to automatically assign the last probability value of the table. This takes into account the constraint that the probabilities of the states of a node must sum to one. This way, the user only needs to specify $n - 1$ entries of the table manually, and SamIam computes the remaining probability by subtracting their sum from 1: $Pr(N = n_n) = 1 - \sum_{i=1}^{n-1} Pr(N = n_i)$.

The Normalize button normalises the entries of the conditional probability table so that the probabilities of each distribution in the table sum to one.
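The two operations can be sketched on a single column of a table (one parent configuration). The functions below are hypothetical re-implementations of what the text describes, not SamIam's own code.

```python
def complement(column):
    """Like the Complement button (hypothetical re-implementation): replace the
    last entry by 1 minus the sum of the other n - 1 entries."""
    head = column[:-1]
    return head + [1.0 - sum(head)]

def normalize(column):
    """Rescale the entries of one distribution so that they sum to one."""
    total = sum(column)
    if total == 0:
        raise ValueError("cannot normalise an all-zero column")
    return [value / total for value in column]

print(complement([0.2, 0.5, 0.0]))   # last entry becomes 1 - 0.7
print(normalize([2.0, 1.0, 1.0]))    # [0.5, 0.25, 0.25]
```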


4.3 Learning 

Given a log of events and a graphical structure, SamIam is able to estimate the conditional probability tables of the given Bayesian Network automatically. This learning process can use maximum likelihood estimation (Section 3.2.1) if the log of events is complete, or the EM Clustering algorithm if the log of events is incomplete (Bishop 2007).

In the scope of this work, since we were given a complete event log, the conditional probability tables were filled by maximum likelihood estimation, that is, by counting the number of times each assignment of values was present in the log of events and then normalising these counts to obtain probability values.


SamIam can do this automatically in the query mode. In the main SamIam interface, one can select the query mode as presented in Figure 22. To open the learning menu, one needs to select the option EM Learning (Figure 23).
Figure 22: Entering the query mode.
Figure 23: Entering the learning menu.


Under the EM Learning menu, the user is presented with a window that asks for a training file, a log-likelihood threshold, the maximum number of iterations that the algorithm should perform, and whether the learning algorithm should use a bias to prevent divisions by zero. Figure 24 illustrates these options.


Figure 24: SamIam learning menu.


In Figure 24, the field Max iterations corresponds to the maximum number of iterations that EM Clustering should perform in case the algorithm does not converge. For the scope of this work, this entry is irrelevant, since we are dealing with a fully observed log of events; consequently, EM Clustering reduces to maximum likelihood estimation.


The field Log-likelihood threshold is also used by EM Clustering: the algorithm is considered to have converged when the change in the log-likelihood function falls below this threshold. It is common practice in the literature to set this value to 0.05 (Bishop 2007; Koller & Friedman 2009).


The option Use bias to prevent divisions by zero should always be used; otherwise, the maximum likelihood estimate formula will attempt a division by zero whenever it computes a probability conditioned on a parent configuration that does not occur in the training set.
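The exact bias that SamIam applies is not described in the text. As an assumption, the sketch below uses additive (Laplace) smoothing, one common way of avoiding division by zero in counting-based estimates: a pseudo-count is added to every cell, so the denominator is never zero. The function name, the pseudo-count and the data are hypothetical.

```python
from collections import Counter

def smoothed_cpt(data, states_x, states_y, pseudo_count=1.0):
    """Additive (Laplace) smoothing of the maximum likelihood estimate:
    Pr(Y = y | X = x) = (count(x, y) + k) / (count(x) + k * |states_y|).
    The denominator is never zero, even for a value x absent from the data."""
    joint = Counter(data)                   # count((x, y)) for pairs (x[m], y[m])
    marginal = Counter(x for x, _ in data)  # count((x))
    k = pseudo_count
    return {(y, x): (joint[(x, y)] + k) / (marginal[x] + k * len(states_y))
            for x in states_x for y in states_y}

# HYPOTHETICAL data in which the parent value "absent" never occurs.
data = [("present", "present"), ("present", "present"), ("present", "absent")]
states = ["present", "absent"]
cpt = smoothed_cpt(data, states, states)
print(cpt[("present", "present")], cpt[("present", "absent")])  # 0.6 0.5
```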

In process mining, a training set consists of a portion of the log of events that is used to fit (train) a model for the prediction of values. In the scope of this work, the training set consists of 70% of the entries of the log of events, selected at random. The first line of the training file contains the names of all random variables (nodes). The remaining lines of the file correspond to the instances of the nodes that are specified in the log of events. In this work, we modelled binary random variables with the states present, representing the occurrence of a task in the business process, and absent, representing its non-occurrence. Figure 25 shows the log of events (left) and its conversion into a training file in the SamIam format (right).
Figure 25 (left): excerpt of the log of events, with COMPLETE events such as A_SUBMITTED, A_PARTLYSUBMITTED, A_PREACCEPTED and A_DECLINED.

Figure 25 (right): excerpt of training.dat:
A_ACCEPTED,A_ACTIVATED,A_APPROVED,A_CANCELLED,A_DECLINED,A_FINALIZED,A_PARTLYSUBMITTED,A_PREACCEPTED,A_REGISTERED,A_SUBMITTED
absent,absent,absent,absent,present,absent,present,present,absent,present
present,absent,absent,present,absent,present,present,present,absent,present
absent,absent,absent,absent,present,absent,present,absent,absent,present
absent,absent,absent,absent,present,absent,present,present,absent,present
absent,absent,absent,absent,present,absent,present,absent,absent,present
absent,absent,absent,absent,present,absent,present,present,absent,present

Figure 25: Log of events (left) and its conversion into a SamIam training file (right).
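A sketch of the conversion into the training file format described above, under stated assumptions: events are given as hypothetical (case_id, activity, lifecycle) tuples, only COMPLETE events count, each case becomes one row of present/absent values, and "70% of the entries" is interpreted as a random 70% of the cases. The fixed seed only makes the split reproducible; none of these names come from the original work.

```python
import csv
import io
import random
from collections import defaultdict

def build_training_rows(events, activities):
    """events: iterable of (case_id, activity, lifecycle) tuples.
    Returns one row per case with 'present'/'absent' for every activity."""
    seen = defaultdict(set)
    for case_id, activity, lifecycle in events:
        if lifecycle == "COMPLETE":
            seen[case_id].add(activity)
    return [["present" if a in done else "absent" for a in activities]
            for _, done in sorted(seen.items())]

def write_training_file(rows, activities, fraction=0.7, seed=0):
    """A random `fraction` of the cases; the header line holds the node names."""
    rng = random.Random(seed)
    sample = rng.sample(rows, round(fraction * len(rows)))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(activities)
    writer.writerows(sample)
    return out.getvalue()
```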


After SamIam learns the conditional probability tables, some semantics of the network must be corrected, more specifically the mutually exclusive relationships. For instance, Figure 26 presents a conditional probability table that was automatically learned by SamIam. As one can see, when the node A_PARTLYSUBMITTED is absent, SamIam did not update the default uniform distribution, so the probabilities of 0.5 remained in the conditional probability table. This means that there were no cases in the log without an instance of the A_PARTLYSUBMITTED node. This happens because, in process mining, the activities that are performed are usually mutually exclusive, unless a special structure is used to state the contrary. To correct these probabilities so that the mutual exclusion is captured, one just needs to fill in the conditional probability table as illustrated in Figure 27: when the preceding node is absent, the subsequent nodes should also be absent.
A_PREACCEPTED Properties: Conditional Probability Table

A_PREACCEPTED    A_PARTLYSUBMITTED = present    A_PARTLYSUBMITTED = absent
present          0.5659388646288209             0.5
absent           0.434061135371179              0.5

Figure 26: Learned conditional probability table.


A_PREACCEPTED Properties: Conditional Probability Table

A_PREACCEPTED    A_PARTLYSUBMITTED = present    A_PARTLYSUBMITTED = absent
present          0.5659388646288209             0.0
absent           0.434061135371179              1.0

Figure 27: Corrected conditional probability table denoting mutual exclusion between the nodes.
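The correction from Figure 26 to Figure 27 can be sketched for a node with a single parent. The nested-dictionary layout of the table and the function name are hypothetical; the numbers are those of Figure 26.

```python
def enforce_absent_parent(cpt):
    """cpt maps a parent state to {child_state: probability} for a node with a
    single parent. Force the child to be absent whenever the parent is absent,
    as in Figure 27; the column for a present parent is left untouched."""
    fixed = dict(cpt)  # shallow copy: the learned table is not modified
    fixed["absent"] = {"present": 0.0, "absent": 1.0}
    return fixed

# Conditional probability table of A_PREACCEPTED given A_PARTLYSUBMITTED (Figure 26).
learned = {"present": {"present": 0.5659388646288209, "absent": 0.434061135371179},
           "absent": {"present": 0.5, "absent": 0.5}}
print(enforce_absent_parent(learned)["absent"])  # {'present': 0.0, 'absent': 1.0}
```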


5 Case Study: Loan Application 

The event log that we use in this work is taken from a Dutch financial institute². The event log represents a loan application process of a global financial organisation, in which a customer requests a certain amount of money. The process is composed of three different subprocesses, and the first letter of each task identifies the subprocess it belongs to. Tasks starting with A_ correspond to states of the application, tasks starting with O_ correspond to offers belonging to the application, and tasks starting with W_ correspond to work items belonging to the application.

The general scenario is as follows. A webpage enables the submission of loan applications. A customer selects a certain amount of money and then submits the request. Then, the application performs some automatic tasks and checks whether the application is eligible. If it is eligible, the customer is sent an offer by mail. After this offer is received, it is evaluated. In case of any missing information, the offer goes back to the client and is evaluated again until all the required information is gathered. A final evaluation is done on the application. Finally, the application is approved and activated.

The log contains 262 200 events and 13 087 cases. Table 1 summarises the statistics of the event log.

5.1 Converting the Log of Events into a SamIam Bayesian Network

In this work, we wrote a Java program that received the event log in CSV format as input and returned a Bayesian Network in a file format readable by the SamIam toolkit. The program parsed every line of the event log and grouped all activities that were complete and belonged to the same instance (i.e. had the same caseId). It then automatically built a graph represented as a matrix and computed the frequency of the connections between nodes.
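A minimal Python sketch of this preprocessing step is shown below. It is not the authors' Java program: the column names (case_id, activity, lifecycle, timestamp), the ISO-8601 timestamps and the reading of a "connection" as one task directly following another within the same case are assumptions made for illustration, and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_complete_traces(csv_file):
    """Group COMPLETE events by case id and order each case by timestamp."""
    cases = defaultdict(list)
    for row in csv.DictReader(csv_file):
        if row["lifecycle"] == "COMPLETE":  # keep only completed activities
            cases[row["case_id"]].append((row["timestamp"], row["activity"]))
    # ISO-8601 timestamps sort chronologically as plain strings.
    return {case: [activity for _, activity in sorted(events)]
            for case, events in cases.items()}


def connection_matrix(traces):
    """Count how often task a is directly followed by task b in the same case."""
    tasks = sorted({task for trace in traces.values() for task in trace})
    index = {task: i for i, task in enumerate(tasks)}
    matrix = [[0] * len(tasks) for _ in tasks]
    for trace in traces.values():
        for a, b in zip(trace, trace[1:]):
            matrix[index[a]][index[b]] += 1
    return tasks, matrix


# Hypothetical log fragment in the assumed column layout.
log = io.StringIO(
    "case_id,activity,lifecycle,timestamp\n"
    "1,A_SUBMITTED,COMPLETE,2011-10-01T10:00:00\n"
    "1,A_PARTLYSUBMITTED,COMPLETE,2011-10-01T10:00:05\n"
    "1,W_Afhandelen leads,SCHEDULE,2011-10-01T10:01:00\n"
    "1,A_DECLINED,COMPLETE,2011-10-01T10:02:00\n"
    "2,A_SUBMITTED,COMPLETE,2011-10-02T09:00:00\n"
    "2,A_PARTLYSUBMITTED,COMPLETE,2011-10-02T09:00:03\n"
    "2,A_PREACCEPT,COMPLETE,2011-10-02T09:05:00\n"
)
traces = read_complete_traces(log)
tasks, matrix = connection_matrix(traces)
```

The SCHEDULE row is dropped, so case 1 becomes A_SUBMITTED, A_PARTLYSUBMITTED, A_DECLINED, and the matrix records the transition A_SUBMITTED to A_PARTLYSUBMITTED twice.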

Given this matrix representation of the graph, a second Java program was written to convert this

2 http://www.win.tue.nl/bpi/2012/challenge 
Event                      Num Occurrences   Event                            Num Occurrences
A_SUBMITTED                13 087            W_Nabellen incomplete dossiers   11 407
A_PARTLYSUBMITTED          13 087            W_Valideren aanvraag             7 895
A_PREACCEPT                7 367             W_Afhandelen leads               5 898
A_CANCELLED                2 807             W_Beoordelen fraude              270
A_APPROVED                 2 246             W_Wijzigen contractgegevens      0
A_REGISTERED               2 246             O_ACCEPTED                       2 243
A_ACTIVATED                2 246             O_SELECTED                       7 030
A_DECLINED                 7 635             O_CREATED                        7 030
A_FINALIZED                5 015             O_SENT                           7 030
A_ACCEPTED                 5 113             O_SENT_BACK                      3 454
W_Completeren aanvraag     23 967            O_CANCELLED                      3 655
W_Nabellen offertes        22 976            O_DECLINED                       802

Table 1: Summary of the statistics of the Loan Application event log. Only COMPLETE events were taken into account.


matrix into a network file recognized by SamIam. Figures 28 and 29 show an example of such a network file, describing a network of the form C ← A → B.


net
{
    propagationenginegenerator1791944048146838126L = "edu.ucla.belief.approx.BeliefPropagationSettings@4497ac...
    huginenginegenerator3061656038650325130L = "edu.ucla.belief.inference.JoinTreeSettings@41a443cb";
    recoveryenginegenerator6944530267470113528L = "edu.ucla.util.SettingsImpl@75ee618";
    node_size = (130 55);
}

node B
{
    states = ("true" "false" );
    position = (229 -226);
    excludepolicy = "include whole CPT";
    ismapvariable = "false";
    ID = "variable1";
    label = "B";
    DSLxSUBMODEL = "Root Submodel";
    diagnosistype = "AUXILIARY";
}

node C
{
    states = ("true" "false" );
    position = (505 -226);
    excludepolicy = "include whole CPT";
    ismapvariable = "false";
    ID = "variable2";
    label = "C";
    DSLxSUBMODEL = "Root Submodel";
    diagnosistype = "AUXILIARY";
}

Figure 28: Example of a SamIam network file.

node A
{
    states = ("true" "false" );
    position = (364 -135);
    excludepolicy = "include whole CPT";
    ismapvariable = "false";
    ID = "variableB";
    label = "A";
    DSLxSUBMODEL = "Root Submodel";
    diagnosistype = "AUXILIARY";
}

potential ( B | A )
{
    data = (( 0.7 0.3 )
            ( 0.4 0.6 ));
}

potential ( C | A )
{
    data = (( 0.6 0.4 )
            ( 0.3 0.7 ));
}

potential ( A | )
{
    data = ( 0.7 0.3 );
}

Figure 29: Example of a SamIam network file.
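The conversion program itself is not listed in the paper. As a rough, hypothetical sketch of that step, the Python function below writes a stripped-down file in the same Hugin-style syntax as the network file above, keeping only the states and label of each node. It assumes binary nodes with at most one parent and CPT rows ordered by the parent states ("true", "false"); whether SamIam accepts a file without the remaining node properties is an untested assumption, and the probabilities are illustrative.

```python
def format_row(row):
    """Format one distribution, e.g. (0.7, 0.3) -> "(0.7 0.3)"."""
    if abs(sum(row) - 1.0) > 1e-9:
        raise ValueError(f"row {row} does not sum to 1")
    return "(" + " ".join(str(p) for p in row) + ")"


def to_net(nodes, parents, cpts):
    """Serialise a small network of binary nodes in Hugin-style .net syntax.

    nodes   -- node names, in the order they should be written
    parents -- node -> list of parents (at most one parent in this sketch)
    cpts    -- node -> list of (Pr(true), Pr(false)) rows: a single row for a
               root node, otherwise one row per parent state ("true", "false")
    """
    lines = ["net", "{", "}", ""]
    for node in nodes:
        lines += [f"node {node}", "{",
                  '    states = ("true" "false");',
                  f'    label = "{node}";', "}", ""]
    for node in nodes:
        pa = parents.get(node, [])
        if len(pa) > 1:
            raise ValueError("this sketch only handles nodes with at most one parent")
        rows = cpts[node]
        if pa:
            data = "(" + " ".join(format_row(r) for r in rows) + ")"
        else:
            data = format_row(rows[0])
        given = " ".join(pa) + " " if pa else ""
        lines += [f"potential ( {node} | {given})", "{",
                  f"    data = {data};", "}", ""]
    return "\n".join(lines)


# The network C <- A -> B with illustrative probabilities.
text = to_net(
    nodes=["A", "B", "C"],
    parents={"B": ["A"], "C": ["A"]},
    cpts={"A": [(0.7, 0.3)],
          "B": [(0.7, 0.3), (0.4, 0.6)],
          "C": [(0.6, 0.4), (0.3, 0.7)]},
)
print(text)
```

In this syntax the outer parentheses of a conditional potential range over the parent's states and the inner ones over the child's states, which is why each CPT row is written as its own parenthesised pair.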


In a first attempt, we mapped the entire event log into a Bayesian Network. However, the full log contains many tasks (about 24 random variables), which made the network too large and complex to analyse. Figure 30 shows the network extracted directly from the event log. The cycles in this network were expected, since many events in the log recur within the same case. Later in this work, we will specify a heuristic to remove such cyclic structures and turn any network into a directed acyclic graph.
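To give a concrete picture of the problem, the Python sketch below shows one standard way to turn a cyclic graph into a directed acyclic graph: run a depth-first search from the start task and drop every back edge, i.e. every edge pointing to a node that is still on the current search path. This is a generic technique, not necessarily the heuristic used in this work, and the graph is a hypothetical fragment of Figure 30 built from task names in Table 1.

```python
def remove_back_edges(graph, start):
    """Return (dag, removed), where dag is graph minus its DFS back edges.

    graph -- dict mapping every node to the list of its successors
    start -- node from which the depth-first search begins
    """
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    colour = {node: WHITE for node in graph}
    dag = {node: [] for node in graph}
    removed = []

    def visit(u):
        colour[u] = GREY
        for v in graph[u]:
            if colour[v] == GREY:         # v is on the current path: u -> v closes a cycle
                removed.append((u, v))
                continue
            dag[u].append(v)
            if colour[v] == WHITE:
                visit(v)
        colour[u] = BLACK

    # Start from the given node, then cover anything it cannot reach.
    for node in [start] + sorted(graph):
        if colour[node] == WHITE:
            visit(node)
    return dag, removed


graph = {
    "A_SUBMITTED": ["A_PARTLYSUBMITTED"],
    "A_PARTLYSUBMITTED": ["W_Completeren aanvraag"],
    "W_Completeren aanvraag": ["W_Completeren aanvraag", "W_Nabellen offertes"],
    "W_Nabellen offertes": ["W_Completeren aanvraag"],
}
dag, removed = remove_back_edges(graph, "A_SUBMITTED")
print(removed)
```

A graph with no back edges under some depth-first search is acyclic, so the result is a valid DAG; which edge of each cycle is dropped depends on the search order, and a frequency-based rule (for example, dropping the least frequent edge of a cycle) would be an equally plausible alternative.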



Figure 30: Full representation of the Loan Application Bayesian Network. 


Since the network in Figure 30 was too complex, we kept only the nodes corresponding to the A_ tasks of the event log, as was done in (Adriansyah & Buijs, 2012; Bautista et al., 2012; Bose & van der Aalst, 2012; Kang et al., 2012).

The resulting Bayesian Network was smaller, containing only 10 random variables. We then altered 
the Bayesian Network in order to add mutually exclusive relationships between the nodes A_DECLINED 
and A_CANCELLED and between the nodes A_APPROVED, A_DECLINED and A_CANCELLED. 

The mutually exclusive relation between the nodes A_DECLINED and A_CANCELLED is straightforward: a loan application cannot be both declined and cancelled. Hence, if an application is known to be declined, its probability of being cancelled is zero, and vice versa. The same reasoning extends to A_APPROVED: in this model, an approved application can be neither declined nor cancelled. Figure 31 presents this network, and Figures 32 to 34 illustrate the mutual exclusion between the nodes.
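The "vice versa" part follows from Bayes' rule: if Pr(A_CANCELLED = present | A_DECLINED = present) = 0, then Pr(A_DECLINED = present | A_CANCELLED = present) = Pr(A_CANCELLED = present | A_DECLINED = present) Pr(A_DECLINED = present) / Pr(A_CANCELLED = present) = 0. The Python sketch below checks this numerically on a hypothetical two-node fragment A_DECLINED → A_CANCELLED; the prior and the remaining CPT entries are made-up numbers, not values learned from the log.

```python
# Hypothetical numbers for a two-node fragment A_DECLINED -> A_CANCELLED.
p_declined = {"present": 0.58, "absent": 0.42}
p_cancelled_given_declined = {
    "present": {"present": 0.0, "absent": 1.0},  # mutual exclusion: declined => never cancelled
    "absent": {"present": 0.3, "absent": 0.7},
}


def posterior_declined(cancelled):
    """Pr(A_DECLINED | A_CANCELLED = cancelled) by Bayes' rule over both states."""
    joint = {d: p_declined[d] * p_cancelled_given_declined[d][cancelled]
             for d in p_declined}
    evidence = sum(joint.values())          # Pr(A_CANCELLED = cancelled)
    return {d: p / evidence for d, p in joint.items()}


print(posterior_declined("present"))  # {'present': 0.0, 'absent': 1.0}
```

Whatever prior is chosen for A_DECLINED, a zero in the CPT propagates backwards to a zero posterior, which is what the corrected tables in Figures 32 to 34 rely on.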

To compare our model with other approaches in the literature, we also built a Markov Chain from the same event log (Section 2). We then computed the probability of each sequence in the test set under both the Bayesian Network and the Markov Chain, and compared the results. Section 5.3 presents the main outcomes of these experiments.
Figure 31: Bayesian Network representation of the loan application. Only A_ nodes were taken into account. Mutually exclusive relationships were added manually between nodes A_DECLINED and A_CANCELLED and between nodes A_APPROVED, A_DECLINED and A_CANCELLED.


5.2 Converting the Log of Events into a Markov Chain 

As already mentioned, we also built a Markov Chain with a Python script, using the same training set as for the Bayesian Network. The transition probabilities of the Markov Chain were computed by counting how often each event was followed by each other event and normalising these counts to obtain probabilities. Figure 35 shows the resulting Markov Chain.
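A minimal Python sketch of this estimation is given below; it is not the authors' script. It assumes a first-order chain with artificial start and end states, so that the probability of a complete trace is the product of its transition probabilities (including the final transition to the end state), and the four traces are hypothetical.

```python
from collections import Counter, defaultdict

START, END = "<start>", "<end>"  # artificial states marking where a case begins and ends


def fit_markov_chain(traces):
    """Maximum-likelihood transition probabilities from a list of task sequences."""
    counts = defaultdict(Counter)
    for trace in traces:
        path = [START, *trace, END]
        for a, b in zip(path, path[1:]):
            counts[a][b] += 1
    chain = {}
    for a, successors in counts.items():
        total = sum(successors.values())
        chain[a] = {b: n / total for b, n in successors.items()}
    return chain


def sequence_probability(chain, trace):
    """Probability that the chain generates exactly this complete trace."""
    path = [START, *trace, END]
    p = 1.0
    for a, b in zip(path, path[1:]):
        p *= chain.get(a, {}).get(b, 0.0)   # unseen transitions get probability 0
    return p


traces = [
    ["A_SUBMITTED", "A_PARTLYSUBMITTED", "A_DECLINED"],
    ["A_SUBMITTED", "A_PARTLYSUBMITTED", "A_DECLINED"],
    ["A_SUBMITTED", "A_PARTLYSUBMITTED", "A_PREACCEPT", "A_DECLINED"],
    ["A_SUBMITTED", "A_PARTLYSUBMITTED", "A_PREACCEPT", "A_CANCELLED"],
]
chain = fit_markov_chain(traces)
print(chain["A_PARTLYSUBMITTED"])              # {'A_DECLINED': 0.5, 'A_PREACCEPT': 0.5}
print(sequence_probability(chain, traces[2]))  # 0.25
```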
Figure 32: Mutual exclusion between nodes A_DECLINED, A_CANCELLED and A_APPROVED.

Figure 33: Mutual exclusion between nodes A_DECLINED, A_CANCELLED and A_APPROVED.

Figure 34: Mutual exclusion between nodes A_DECLINED, A_CANCELLED and A_APPROVED.


5.3 Results 

After defining the structure of the Bayesian Network for the loan application, we randomly selected 70% of the cases in the event log as the training set and used the remaining 30% as the test set to validate our model.
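The split is done per case rather than per event, so that every trace falls entirely into one of the two sets. A small Python sketch follows; the fixed seed and the use of `random.Random.shuffle` are our choices for reproducibility, not details reported in the paper.

```python
import random


def split_cases(case_ids, train_fraction=0.7, seed=42):
    """Randomly split case identifiers into a training set and a test set."""
    ids = sorted(case_ids)              # fixed order so that the seed fully determines the split
    rng = random.Random(seed)
    rng.shuffle(ids)
    cut = round(train_fraction * len(ids))
    return set(ids[:cut]), set(ids[cut:])


train, test = split_cases(range(13087))
print(len(train), len(test))            # 9161 3926
```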

The training set was given as input to SamIam to learn the conditional probability tables. Then, to test the model, we developed a MATLAB program to perform probabilistic inference: it takes SamIam's network file as input and builds the corresponding Bayesian Network structure, from which full joint probability distributions and marginal probabilities can be computed. A separate Java program takes the test set as input and validates the model as follows: we computed the probability of some events occurring in the test set and compared this value with the probability given by the Bayesian Network trained on the training set. Tables 2 to 8 show the results obtained for different queries, with the test-set estimates in the "Test Set" column and the trained network's estimates in the "Training Set" column.
Figure 35: Markov Chain representation of the loan application.
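The validation step can be sketched as follows (a hypothetical Python version, not the Java program used in this work). The empirical probability of a query on the test set is a ratio of case counts, and the ERROR % column of Tables 2 to 8 is consistent with |test set - training set| × 100, the same formula stated for Table 10; for instance, 0.4027 and 0.2324 in Table 5 give about 17.03. The test traces below are made up.

```python
def empirical_conditional(traces, target, evidence):
    """Pr(target = present | evidence = present) as a ratio of case counts."""
    cases_with_evidence = [set(trace) for trace in traces if evidence in trace]
    if not cases_with_evidence:
        return float("nan")          # the query is undefined without such cases
    hits = sum(target in case for case in cases_with_evidence)
    return hits / len(cases_with_evidence)


def error_points(p_test, p_model):
    """ERROR % as used in the tables: absolute difference times 100."""
    return abs(p_test - p_model) * 100


# Hypothetical test-set traces (sequences of A_ tasks).
test_traces = [
    ["A_SUBMITTED", "A_PARTLYSUBMITTED", "A_DECLINED"],
    ["A_SUBMITTED", "A_PARTLYSUBMITTED", "A_PREACCEPT", "A_DECLINED"],
    ["A_SUBMITTED", "A_PARTLYSUBMITTED", "A_PREACCEPT", "A_CANCELLED"],
]
p_test = empirical_conditional(test_traces, "A_CANCELLED", "A_PREACCEPT")
print(p_test)                                   # 0.5
print(round(error_points(0.4027, 0.2324), 2))   # 17.03 (Table 5, A_CANCELLED row)
```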


The overall results suggest that the Bayesian Network learned from the event log is a good approach for process mining, since the errors obtained were low: below 3 percentage points for every query that does not involve the node A_CANCELLED. The largest errors are associated with A_CANCELLED. For instance, in Table 5 the probability Pr( A_CANCELLED = present | A_PREACCEPT = present ) has an error of about 17 percentage points. One possible explanation is the mutual exclusivity constraints imposed on this node. Since in a Bayesian Network evidence on one node can affect the probabilities of all the nodes connected to it,
Probability                      Test Set   Training Set   ERROR %
Pr( A_CANCELLED = present )      0.0000     0.0000         0.0000
Pr( A_ACTIVATED = present )      0.0000     0.0000         0.0000
Pr( A_ACCEPTED = present )       0.1063     0.1099         0.3592
Pr( A_FINALIZED = present )      0.1010     0.1067         0.5685
Pr( A_PREACCEPT = present )      0.2307     0.2595         2.8799
Pr( A_SUBMITTED = present )      1.0000     1.0000         0.0000

Table 2: Results obtained when A_DECLINED = present was observed.


Probability                      Test Set   Training Set   ERROR %
Pr( A_DECLINED = present )       0.0000     0.0000         0.0000
Pr( A_ACTIVATED = present )      0.0000     0.0000         0.0000
Pr( A_ACCEPTED = present )       0.6098     0.6857         7.5916
Pr( A_FINALIZED = present )      0.5882     0.6567         6.8532
Pr( A_PREACCEPT = present )      1.0000     1.0000         0.0000
Pr( A_SUBMITTED = present )      1.0000     1.0000         0.0000

Table 3: Results obtained when A_CANCELLED = present was observed.


Probability                      Test Set   Training Set   ERROR %
Pr( A_DECLINED = present )       0.5773     0.5860         0.8715
Pr( A_ACTIVATED = present )      0.1719     0.1715         0.0387
Pr( A_ACCEPTED = present )       0.3911     0.3905         0.0638
Pr( A_FINALIZED = present )      0.3830     0.3833         0.0310
Pr( A_PREACCEPT = present )      0.5559     0.5659         1.0005
Pr( A_CANCELLED = present )      0.2238     0.1315         9.2335

Table 4: Results obtained when A_SUBMITTED = present was observed.


Probability                      Test Set   Training Set   ERROR %
Pr( A_DECLINED = present )       0.2396     0.2687         2.9121
Pr( A_ACTIVATED = present )      0.3092     0.3030         0.6208
Pr( A_ACCEPTED = present )       0.7036     0.6900         1.3619
Pr( A_FINALIZED = present )      0.6890     0.6773         1.1660
Pr( A_SUBMITTED = present )      1.0000     1.0000         0.0000
Pr( A_CANCELLED = present )      0.4027     0.2324         17.0257

Table 5: Results obtained when A_PREACCEPT = present was observed.


Probability                      Test Set   Training Set   ERROR %
Pr( A_DECLINED = present )       0.1523     0.1632         1.0939
Pr( A_ACTIVATED = present )      0.4488     0.4475         0.1303
Pr( A_ACCEPTED = present )       1.0000     1.0000         0.0000
Pr( A_PREACCEPT = present )      1.0000     1.0000         0.0000
Pr( A_SUBMITTED = present )      1.0000     1.0000         0.0000
Pr( A_CANCELLED = present )      0.3438     0.2254         11.8350

Table 6: Results obtained when A_FINALIZED = present was observed.
Probability                      Test Set   Training Set   ERROR %
Pr( A_DECLINED = present )       0.1569     0.1649         0.7999
Pr( A_ACTIVATED = present )      0.4395     0.4392         0.0253
Pr( A_FINALIZED = present )      0.9792     0.9815         0.2333
Pr( A_PREACCEPT = present )      1.0000     1.0000         0.0000
Pr( A_SUBMITTED = present )      1.0000     1.0000         0.0000
Pr( A_CANCELLED = present )      0.3490     0.2310         11.7958

Table 7: Results obtained when A_ACCEPTED = present was observed.


Probability                      Test Set   Training Set   ERROR %
Pr( A_DECLINED = present )       0.0000     0.0000         0.0000
Pr( A_ACCEPTED = present )       1.0000     1.0000         0.0000
Pr( A_FINALIZED = present )      1.0000     1.0000         0.0000
Pr( A_PREACCEPT = present )      1.0000     1.0000         0.0000
Pr( A_SUBMITTED = present )      1.0000     1.0000         0.0000
Pr( A_CANCELLED = present )      0.0000     0.0000         0.0000

Table 8: Results obtained when A_ACTIVATED = present was observed.


adding new relationships between nodes can introduce non-trivial effects in the model.

In another experiment, we compared the proposed Bayesian Network with a Markov Chain, trained in the same way as the Bayesian Network.

To validate both approaches, we used the test set and computed the probability of each sequence occurring in the Bayesian Network and in the Markov Chain. These probabilities were then weighted by the number of occurrences of each sequence in the test set. The results are detailed in Table 10.
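The Total row of Table 10 is consistent with an occurrence-weighted average of each probability column. The Python snippet below recomputes it from the rounded values printed in the table (our reading of the weighting, not code from the paper). It reproduces 0.2435 and 0.2561, while the difference comes out at about 1.27 rather than 1.2674, presumably because the table's entries are rounded.

```python
# (occurrences in the test set, BN probability, MC probability), one tuple per row of Table 10
rows = [
    (22, 0.046426, 0.0051),
    (1, 0.00154, 0.0002),
    (83, 0.0627, 0.0266),
    (1744, 0.433843, 0.4340),
    (282, 0.046369, 0.0877),
    (12, 0.000534, 0.0019),
    (229, 0.02363, 0.0626),
    (343, 0.041347, 0.0826),
    (19, 0.003809, 0.0051),
    (517, 0.0864, 0.1226),
    (675, 0.1715, 0.1715),
]


def weighted_average(rows, column):
    """Average of one probability column, weighted by the number of occurrences."""
    total = sum(row[0] for row in rows)
    return sum(row[0] * row[column] for row in rows) / total


bn = weighted_average(rows, 1)
mc = weighted_average(rows, 2)
print(round(bn, 4), round(mc, 4), round(abs(bn - mc) * 100, 2))  # 0.2435 0.2561 1.27
```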

Table 10 shows that the probabilities computed by the Bayesian Network are very close to those computed by the Markov Chain. For individual sequences in the test set, the difference never exceeded 4.13 percentage points, and the overall weighted difference between the proposed Bayesian Network and the Markov Chain was about 1.2674 percentage points. We consider both differences negligible given the total amount of data tested. This


Processes            Process Encoding    Processes       Process Encoding
A_SUBMITTED          A_SUB               A_APPROVED      A_APPR
A_PARTLYSUBMITTED    A_PART              A_REGISTERED    A_REG
A_PREACCEPT          A_PRE               A_ACTIVATED     A_ACT
A_ACCEPTED           A_ACC               A_DECLINED      A_DEC
A_FINALIZED          A_FIN               A_CANCELLED     A_CAN

Table 9: Encodings of the nodes used in the Bayesian Network.
Chain                                                              Occ. Test Set   BN         MC       ERROR %
A_SUB → A_PART → A_PRE                                             22              0.046426   0.0051   4.13 %
A_SUB → A_PART → A_PRE → A_ACC                                     1               0.00154    0.0002   0.13 %
A_SUB → A_PART → A_PRE → A_ACC → A_FIN                             83              0.0627     0.0266   3.61 %
A_SUB → A_PART → A_DEC                                             1744            0.433843   0.4340   0.01 %
A_SUB → A_PART → A_PRE → A_DEC                                     282             0.046369   0.0877   4.13 %
A_SUB → A_PART → A_PRE → A_ACC → A_DEC                             12              0.000534   0.0019   0.13 %
A_SUB → A_PART → A_PRE → A_ACC → A_FIN → A_DEC                     229             0.02363    0.0626   3.90 %
A_SUB → A_PART → A_PRE → A_CAN                                     343             0.041347   0.0826   4.13 %
A_SUB → A_PART → A_PRE → A_ACC → A_CAN                             19              0.003809   0.0051   0.13 %
A_SUB → A_PART → A_PRE → A_ACC → A_FIN → A_CAN                     517             0.0864     0.1226   3.62 %
A_SUB → A_PART → A_PRE → A_ACC → A_FIN → A_APPR → A_REG → A_ACT    675             0.1715     0.1715   0.0000 %
Total                                                                              0.2435     0.2561   1.2674 %

Table 10: Comparison of a Bayesian Network (BN) and a Markov Chain (MC) for process mining. The ERROR % was computed as |BN - MC| × 100.


means that the Bayesian Network performs similarly to the Markov Chain. Consequently, Bayesian Networks are also a good approach to model business processes, with the added advantage of being able to represent uncertainty (computing the probabilities of tasks whose occurrence is unknown).

5.4 Queries 

As already mentioned, one of the main capabilities of Bayesian Networks for process mining is their ability to deal with uncertainty: they enable the analysis of tasks whose occurrence is unknown. For instance, in the Loan Application Bayesian Network, one may be interested in the probability that the business process ends successfully when only a couple of tasks have been observed. Combining this ability with SamIam's graphical interface enables a fast analysis of business processes and can support risk management.
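To make this concrete, the Python sketch below answers such queries by brute-force enumeration of the full joint distribution, which remains feasible for a network with 10 binary nodes (2^10 = 1024 assignments). The three-node fragment A_PREACCEPT → A_ACCEPTED → A_ACTIVATED and its probabilities are hypothetical, chosen only to show evidence propagating both forwards and backwards; SamIam relies on its own, more efficient inference engines rather than on this naive method.

```python
from itertools import product

# Hypothetical fragment A_PREACCEPT -> A_ACCEPTED -> A_ACTIVATED.
parents = {"A_PREACCEPT": [], "A_ACCEPTED": ["A_PREACCEPT"], "A_ACTIVATED": ["A_ACCEPTED"]}
# Pr(node = present | parent states); keys are tuples of parent states (True = present).
cpt = {
    "A_PREACCEPT": {(): 0.5},
    "A_ACCEPTED": {(True,): 0.7, (False,): 0.0},
    "A_ACTIVATED": {(True,): 0.4, (False,): 0.0},
}
nodes = list(parents)


def joint_probability(assignment):
    """Product of one CPT entry per node (the chain rule for Bayesian Networks)."""
    p = 1.0
    for node in nodes:
        key = tuple(assignment[q] for q in parents[node])
        p_present = cpt[node][key]
        p *= p_present if assignment[node] else 1.0 - p_present
    return p


def query(target, evidence):
    """Pr(target = present | evidence) by summing the full joint distribution."""
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(nodes)):
        assignment = dict(zip(nodes, values))
        if any(assignment[n] != v for n, v in evidence.items()):
            continue                       # inconsistent with the observations
        p = joint_probability(assignment)
        denominator += p
        if assignment[target]:
            numerator += p
    return numerator / denominator         # fails if the evidence has probability 0


print(round(query("A_ACTIVATED", {"A_PREACCEPT": True}), 4))  # 0.28
print(query("A_PREACCEPT", {"A_ACTIVATED": True}))            # 1.0
```

The second query mirrors the reasoning used with Figure 37: once a late task is observed, the network concludes with certainty that every task it depends on was reached.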

Figure 36 shows the probabilities of some nodes of the Loan Application Bayesian Network when it is only known that the application was declined, that is, node A_DECLINED was observed to occur. This analysis shows that most declined applications never reach the state A_PREACCEPT: the probability of having reached it is only 0.2595. Moreover, if an application is declined, the nodes A_ACTIVATED and A_CANCELLED are never reached.

Figure 37 gives another example. When it is known that the application ended up in a cancelled state, one can conclude with probability 1 that the process reached the task A_PREACCEPT and never reached the tasks A_DECLINED and A_ACTIVATED. Moreover, there is a high probability (0.6857 and 0.6567, respectively) that the application had already passed through the tasks A_ACCEPTED and A_FINALIZED when it was cancelled.

The greatest uncertainty in the loan application business process arises when one only knows
Pr( A_SUBMITTED = present | A_DECLINED = present ) = 1
Pr( A_PREACCEPTED = present | A_DECLINED = present ) = 0.2595
Pr( A_FINALIZED = present | A_DECLINED = present ) = 0.1067
Pr( A_ACCEPTED = present | A_DECLINED = present ) = 0.1099
Pr( A_ACTIVATED = present | A_DECLINED = present ) = 0
Pr( A_CANCELLED = present | A_DECLINED = present ) = 0

Figure 36: Analysis of the probabilities of reaching some nodes of the Loan Application Bayesian Network, when it is known that the application was declined.


Pr( A_SUBMITTED = present | A_CANCELLED = present ) = 1
Pr( A_PREACCEPTED = present | A_CANCELLED = present ) = 1
Pr( A_FINALIZED = present | A_CANCELLED = present ) = 0.6567
Pr( A_ACCEPTED = present | A_CANCELLED = present ) = 0.6857
Pr( A_ACTIVATED = present | A_CANCELLED = present ) = 0
Pr( A_DECLINED = present | A_CANCELLED = present ) = 0

Figure 37: Analysis of the probabilities of reaching some nodes of the Loan Application Bayesian Network, when it is known that the application was cancelled.


that the process was started, that is, when only the task A_SUBMITTED is observed to occur (Figure 38). In this situation, the proposed Bayesian Network estimates relatively high probabilities both of reaching the task A_PREACCEPT (0.5659) and of the application being declined (A_DECLINED, 0.586). If A_PREACCEPT is also observed, Figure 39 shows that the process is likely to reach both A_ACCEPTED (0.69) and A_FINALIZED (0.6773).


Pr( A_CANCELLED = present | A_SUBMITTED = present ) = 0.1315
Pr( A_PREACCEPTED = present | A_SUBMITTED = present ) = 0.5659
Pr( A_FINALIZED = present | A_SUBMITTED = present ) = 0.3833
Pr( A_ACCEPTED = present | A_SUBMITTED = present ) = 0.3905
Pr( A_ACTIVATED = present | A_SUBMITTED = present ) = 0.1719
Pr( A_DECLINED = present | A_SUBMITTED = present ) = 0.586

Figure 38: Analysis of the probabilities of reaching some nodes of the Loan Application Bayesian Network, when it is known that the application was submitted.
Pr( A_CANCELLED = present | A_PREACCEPTED = present ) = 0.2324
Pr( A_SUBMITTED = present | A_PREACCEPTED = present ) = 1
Pr( A_FINALIZED = present | A_PREACCEPTED = present ) = 0.6773
Pr( A_ACCEPTED = present | A_PREACCEPTED = present ) = 0.69
Pr( A_ACTIVATED = present | A_PREACCEPTED = present ) = 0.303
Pr( A_DECLINED = present | A_PREACCEPTED = present ) = 0.2687

Figure 39: Analysis of the probabilities of reaching some nodes of the Loan Application Bayesian Network, when it is known that the application was pre-accepted.

6 Conclusion and Future Work 

In this work, we proposed using Bayesian Networks as a new approach to represent business processes automatically extracted from event logs.

In a first step, we extracted the relationships between nodes from the log of events and then used this 
log to train and validate the proposed Bayesian Network. 

Experiments on a Loan Application case study suggest that Bayesian Networks perform comparably to Markov Chains (an overall difference of 1.2674% in Table 10), so they are suitable models for making accurate predictions about sequences of events in the scope of process mining.

Moreover, by modelling a business process with a Bayesian Network, one can take advantage of the ability of these structures to deal with uncertainty. More specifically, Bayesian Networks enable the reconstruction of a flow from only partial observations of the business process.

As future work, it would be interesting to extend the Bayesian Network approach to learn from incomplete event logs. One could train such a network with the EM (Expectation-Maximization) algorithm to find an approximate probability distribution for the occurrence of the tasks. Moreover, together with SamIam, one could try to estimate the most probable sequences of a business process using the probabilities learned from the incomplete log.
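The following Python sketch shows what EM-based learning from an incomplete log could look like in the smallest possible case: a hypothetical two-node fragment X → Y in which the occurrence of X is sometimes not recorded. It is a textbook EM parameter-learning loop written for illustration, not SamIam's implementation, and the data are made up. Here the two cases with a missing X and Y present end up attributed to X being present, because no fully observed case has Y present without X.

```python
def em_two_nodes(cases, iterations=100):
    """EM estimates for a hypothetical fragment X -> Y of two binary nodes.

    cases -- list of (x, y) pairs; y is always observed (True = present),
             x is True, False, or None when the log does not say whether X occurred.
    Returns (Pr(X), Pr(Y | X), Pr(Y | not X)).
    """
    p_x, p_y_x, p_y_notx = 0.5, 0.5, 0.5          # arbitrary starting point
    n = len(cases)
    for _ in range(iterations):
        # E-step: expected counts, replacing each missing x by its posterior probability.
        n_x = n_x_y = n_notx_y = 0.0
        for x, y in cases:
            if x is None:
                like_x = p_x * (p_y_x if y else 1 - p_y_x)
                like_notx = (1 - p_x) * (p_y_notx if y else 1 - p_y_notx)
                w = like_x / (like_x + like_notx)
            else:
                w = 1.0 if x else 0.0
            n_x += w
            if y:
                n_x_y += w
                n_notx_y += 1 - w
        # M-step: maximum-likelihood estimates from the expected counts
        # (assumes both states of X receive some weight, to avoid division by zero).
        p_x = n_x / n
        p_y_x = n_x_y / n_x
        p_y_notx = n_notx_y / (n - n_x)
    return p_x, p_y_x, p_y_notx


# Hypothetical incomplete log: x = None means the occurrence of X was not recorded.
cases = ([(True, True)] * 3 + [(True, False)] + [(False, False)] * 4
         + [(None, True)] * 2)
print(em_two_nodes(cases))  # approaches (0.6, 0.8333..., 0.0)
```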